We study the trade-off between storage overhead and inter-cluster repair bandwidth in clustered storage systems, while recovering from multiple node failures within a cluster. A cluster is a collection of $m$ nodes, and there are $n$ clusters. For data collection, we download the entire content from any $k$ clusters. For repair of $t \geq 2$ nodes within a cluster, we take help from $\ell$ local nodes, as well as $d$ helper clusters. We characterize the optimal trade-off under functional repair, and also under exact repair for the minimum storage and minimum inter-cluster bandwidth (MBR) operating points. Our bounds show the following interesting facts: (1) When $t \mid (m-\ell)$, the trade-off is the same as that under $t=1$, and thus there is no advantage in jointly repairing multiple nodes. (2) When $t \nmid (m-\ell)$, the optimal file size at the MBR point under exact repair can be strictly less than that under functional repair. (3) Unlike the case of $t=1$, increasing the number of local helper nodes does not necessarily increase the system capacity under functional repair.